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Summary. The standard approach to Monte Carlo simulations of SU{N) Yang- 
Mills theories updates successive SU{2) subgroups of each SU{N) link. We follow 
up on an old proposal of Creutz, to perform overrelaxation in the full SU{N) group 
instead, and show that it is more efficient. 

The main bottleneck in Monte Carlo simulations of QCD is the inclusion of light 
dynamical fermions. For this reason, algorithms for the simulation of Yang-Mills 
theories have received less attention. The usual combination of Cabibbo-Marinari 
pseudo-heatbath [1] and Brown-Woch microcanonical overrelaxation [2] of SU{2) 
subgroups is considered satisfactory. However, the large-V limit of QCD presents a 
different perspective. Fermionic contributions are suppressed as 1/N, so that study¬ 
ing the large-Y limit of Yang-Mills theories is interesting in itself. High precision 
is necessary to isolate not only the V —> oo limit, but also the leading 1/N correc¬ 
tion. Such quantitative studies by several groups are underway [3]. They show that 
dimensionless combinations of the glueball masses, the deconhnement temperature 
Tc, and the string tension a approach their N —>■ oo limit rapidly, with rather small 
corrections ~ 1/V^, even down to V = 2. The prospect of making 0{1/N^) ~ 10%, 
or even 0{1/N) ~ 30% accurate predictions for real-world QCD is tantalizing. Nu¬ 
merical simulations can guide theory and help determine the N = oo “master field”. 

Already, an old string prediction Tc/y/a{N = oo) = first dismissed by the 

author himself because it disagreed with Monte Carlo data at the time [4], appears 
to be accurate to within 1% or better. Proposals about the force between charges of 
k units of Zn charge, so-called fc-string tensions, can be confronted with numerical 
simulations, which may or may not give support to connections between QCD and 
supersymmetric theories [5]. Efficient algorithms for SU(N) Yang-Mills theories are 
highly desirable. 

Here, we revive an old, abandoned proposal of Creutz [6], to perform overrelax¬ 
ation in the full SU{N) group, and show its superiority over the traditional SU{2) 
subgroup approach"* . 

^ Global updates, of Hybrid Monte Carlo type, are not considered here, because 
they were shown to be 0(100) times less efficient than local ones for SU{3) in [7]. 
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1 State of the Art 

We consider the problem of updating a link matrix U £ SU{N), from an old value 
[fold to [/new, according to the probability density 

P([/) oc exp(/?^ReTrX''‘[/) . (1) 

■^ReTrX^f/ is the “local action”. The matrix X represents the sum of the “sta¬ 
ples”, the neighboring links which form with U a closed loop contributing to the 
action. This is the situation for the Wilson plaquette action, or for improved actions 
(Symanzik, Iwasaki, ...) containing a sum of loops all in the fundamental represen¬ 
tation. Higher representations make the local action non-linear in U. This typically 
restricts the choice of algorithm to Metropolis, although the approach below can 
still be used to construct a Metropolis candidate (as e.g. in [8]). Thus, X is a sum 
of SU{N) matrices, i.e. a general N x N complex matrix. 

Three types of local Monte Carlo algorithms have been proposed: 

• Metropolis: a random step R in SU{N) is proposed, then accepted or rejected. 
Thus, from [/old, a candidate [/new = RUoid is constructed. To preserve detailed 
balance, the Metropolis acceptance probability is 

Pace = min(l,exp(/3^ ReTrX''‘([/new - [/old))) . (2) 

Acceptance decreases as the stepsize, measured by the deviation of R from the iden¬ 
tity, increases. And an N x N matrix multiplication must be performed to construct 
Pnew, which requires 0{N^) operations. This algorithm is simple but inefficient, be¬ 
cause the direction of the stepsize is random. By carefully choosing this direction, a 
much larger step can be taken as we will see. 

• Heatbath: a new matrix Pnew is generated directly from the probability den¬ 

sity P(P) Eq.(l). This is a manifest improvement over Metropolis, since Pnew is 
completely independent of Poid- However, sampling P(P) requires knowledge of the 
normalization on the right-hand side of Eq.(l). For SU (2), the simple algorithm of [9] 
has been perfected for large /? [10]. For SU{3), a heatbath algorithm also exists [11], 
although it can hardly be called practical. For SU{N), N > 2, one performs instead 
a pseudo-heatbath [1]. Namely, the matrix Poid is multiplied by an embedded SU{2) 
matrix R = ljv -2 <8) Rsu( 2 ), chosen by SU{2) heatbath from the resulting proba¬ 
bility oc exp(/?-P Re Tr(X/Poid)P). Note that computation of the 4 relevant matrix 
elements of (X/Poid) requires 0{N) work. To approach a real heatbath and decrease 
the correlation of Pnew = PoidR with Poid, a sequence of SU{2) pseudo-heatbaths 
is usually performed, where the SU{2) subgroup sweeps the natural choices 

of off-diagonal elements of P. The resulting amount of work is then 0{N^), which 
remains constant relative to the computation of X as X increases. 

• Overrelaxation. Adler introduced stochastic overrelaxation for multi-quadratic 
actions [12]. The idea is to go beyond the minimum of the local action and multiply 
this step by cd £ [1, 2], “reflecting” the link Poid with respect to the action minimum. 
This results in faster decorrelation, just like it produces faster convergence in lin¬ 
ear systems. In fact, as in the latter, infrared modes are accelerated at the expense 
of ultraviolet modes, as explained in [13]. The overrelaxation parameter w can be 
tuned. Its optimal value approaches 2 as the dynamics becomes more critical. In 
this limit, the UV modes do not evolve and the local action is conserved, making 
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Fig. 1. SU{N) overrelaxation acceptance versus N. 


the algorithm microcanonical. In practice, it is simpler to fix lj to 2, and alternate 
overrelaxation with pseudo-heatbath in a tunable proportion (typically 1 HB for 
4-10 OR, the latter number increasing with the correlation length). In 517(2), this 
strategy has been shown to decorrelate large Wilson loops much faster than a heat- 
bath [14], with in addition some slight reduction in the amount of work. It is now the 
adopted standard. For SU{N), one performs u = 2 microcanonical overrelaxation 
steps in most or all 517(2) subgroups, as described in [2]. 

The 517(2) subgroup overrelaxation of Brown and Woch is simple and elegant. 
Moreover, it requires minimal changes to an existing pseudo-heatbath program. 
But it is not the only possibility. Creutz [6] proposed a general overrelaxation in the 
SU{N) group. And Patel [15] implemented overrelaxation in 517(3), whose efficiency 
was demonstrated in [7]. Here, we generalize Patel’s method to 517(A). 


2 SU{N) Overrelaxation 

It may seem surprising at first that working with 517(A) matrices can be as efficient 
as working on 517(2) subgroups. One must bear in mind that the calculation of the 
“staple” matrix X requires 0{N^) operations, since it involves multiplying A x A 
matrices. The relative cost of updating U will remain bounded as A increases, if 
it does not exceed 0{N^) operations. An update of lesser complexity will use a 
negligible fraction of time for large A, and can be viewed as a wasteful use of the 
staple matrix X. Therefore, it is a reasonable strategy to spend 0{N^) operations 
on the link update. A comparison of efficiency should then be performed between (i) 
an update of all SU{2) subgroups, one after the other, following Cabibbo- 

Marinari and Brown-Woch; (ii) a full 517(A) update, described below, involving a 
polar decomposition of similar 0{N^) complexity. One may still worry that (ii) is 
unwise because the final acceptance of the proposed Anew will decrease very fast 
as A increases. Fig. 1 addresses this concern: the acceptance of our SU{N) update 
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scheme decreases in fact very slowly with N, and remains almost 1 for all practical 
N values. 

We now explain how to perform SU{N) overrelaxation, along the lines of [ 6 ]. The 
idea of overrelaxation is to go, in group space, in the direction which minimizes the 
action, but to go beyond the minimum, to the mirror image of the starting point. If 
X is the SU{N) group element which minimizes the action, then the rotation from 
C/oid to X is (Xt/^j). Overrelaxation consists of applying this rotation twice: 

t/„ew = {XU-ilfUold = Xt/JijX (3) 

[/new should then be accepted with the Metropolis probability Eq.(2). The transfor¬ 
mation Eq.(3) from Uoid to [/new is an involution (it is equal to its inverse). Erom 
this property, detailed balance follows. 

Note that this holds for any choice of X which is independent of [/new/oid, result¬ 
ing always in a valid update algorithm. Its efficiency, however, depends on making 
a clever choice for X. The simplest one is X = 1 VX, but the acceptance is small. 
Better alternatives, which we have tried, build X from the Gram-Schmidt orthogo- 
nalization of X, or from its polar decomposition. We have also considered applying 
Gram-Schmidt or polar decomposition to X^ or to X*. In all cases, a subtle issue 
is to make sure that [/new is indeed special unitary (det[/new = 1 ), which entails 
cancelling in X the phase usually present in detX. The best choice for X balances 
work, Metropolis acceptance and effective stepsize. Our numerical experiments have 
led us to the algorithm below, based on the polar decomposition of X, which comes 
very close to finding the SU{N) matrix which minimizes the local action. Note that 
Narayanan and Neuberger [16] have converged independently to almost the same 
method (they do not take Step 3 below). 

2.1 Algorithm 

1. Perform the Singular Value Decomposition (SVD) of X: X = USV\ where U 
and V e U{N), and E is the diagonal matrix of singular values ai (ui = \/A7, 
where the Ai’s are the eigenvalues of the non-negative Hermitian matrix X^X). It 
is simple to show that W = [/V^ is the U{N) matrix which maximizes ReTrX^IT. 
Unfortunately, detUV^ 7 / 1. 

2 . Compute detX = pexp(i((>). The matrix Xjvjv = exp{—i-^)UV^ is a suitable 
SU{N) matrix, adopted by Narayanan and Neuberger [16]. 

3. Find an approximate solution {6i} for the phases of the diagonal matrix D = 
diag(exp(i0i),.., exp(i 6 jv)), 6i — 0 mod 2tv, to maximize ReTrX^(exp(—i.^)[/I/U^). 
To find an approximate solution to this non-linear problem, we assume that all 
phases 6i are small, and solve the linearized problem. 

4. Accept the candidate [/new = XU^^^X, where X = exp{—i-^)UDV\ with prob¬ 
ability Eq.(2) ®. 

2.2 Efficiency 

We set out to compare the efficiency of the algorithm above with that of the standard 
SU{2) subgroup approach, as a function of N. Going up to X = 10 forced us to 

® This corresponds to an overrelaxation parameter w = 2. It may be possible to 
make the algorithm more efficient by tuning w, using the LHMC approach of [17]. 
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Fig. 3. Distance in group space after one link update, (1 — ^ Re Tr UnevrUoid) vs N. 


consider a very small, 2'^ system only. We chose a fixed’t Hooft coupling g^N = 8/3, 
so that P = ^ This choice gives Wilson loop values varying smoothly 

with N, as shown in Fig. 2, and is representative of current SU{N) studies. The 
Metropolis acceptance, as shown in Fig. 1, remains very close to 1. It increases with 
the’t Hooft coupling. 

A first measure of efficiency is given by the average stepsize, i.e. the link change 
under each update. We measure this change simply by (1 — ReTr DnewC^oid). 
The SU{N) overrelaxation generates considerably larger steps than the SU{2) sub¬ 
group approach, as visible in Fig. 3. The real test, of course, is the decorrelation 
of large Wilson loops. On our 2“* lattice, we cannot probe large distances. Polyakov 
loops (Fig. 4, left) show critical slowing down as N increases, with a similar ex- 
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Fig. 4. Autocorrelation time of Polyakov loop (left) and of (S'timeiike — 5spaceiike) 
(right) versus N, for the two algorithms. 


ponent ~ 2.8 using either update scheme. The SU{N) strategy gives a speedup 
0(3), more or less independent of N. One observable, however, indicates a differ¬ 
ent dependence on N for the two algorithms. That is the asymmetry of the action, 
ReTr(Plaq*'™®*'*'® — Plaq'’*’'^'^®*'*'®)). Fig. 4, right, shows that the speedup pro¬ 
vided by the SU{N) overrelaxation grows like ~ jsjo.ss ^ While this may be atypical, 
we never observed a slower decorrelation in the SU(N) scheme for any observable. 

In conclusion, overrelaxation in the full SU{N) group appears superior to the 
standard SU{2) subgroup approach. The results of [7] already indicated this for 
SU{3). Our tests presented here suggest that the advantage grows with N, at least 
for some observables. For S[/(4) in (2-1-1) dimensions [18], the decorrelation of the 
Polyakov loop was ~ 3 times faster in CPU time, using SU{N) overrelaxation, 
although our code implementation used simple calls to LAPACK routines, which 
are not optimized for operations on 4x 4 matrices. We definitely recommend SU{N) 
overrelaxation for large-simulations. 
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